Skip to content

Add specific circuit failure error codes. Fixes #3717, #3543, #3364, #2888, #2859, #1580#3792

Merged
plorenz merged 4 commits intomainfrom
specific-router-error-types
Apr 14, 2026
Merged

Add specific circuit failure error codes. Fixes #3717, #3543, #3364, #2888, #2859, #1580#3792
plorenz merged 4 commits intomainfrom
specific-router-error-types

Conversation

@plorenz
Copy link
Copy Markdown
Member

@plorenz plorenz commented Apr 9, 2026

  • adds ErrorType constants for rejected-by-application, DNS resolution failed,
    port not allowed, invalid link destination, and resources not available
  • adds corresponding CircuitFailureCause strings reported in circuit events
  • extracts classifyDialError() in route handler to map dial errors to specific
    error codes using typed errors, syscall constants, and string matching
  • detects DNS errors via *net.DNSError and string fallback for ER/T hosted
    services where errors are serialized through the SDK message protocol
  • detects resource exhaustion via EMFILE, ENFILE, ENOBUFS syscall errors
  • introduces InvalidLinkDestinationError typed error in forwarder package
  • adds unit tests covering all 16 classification cases
  • adds integration tests for rejected-by-application (SDK host),
    DNS resolution failed, connection refused, and port not allowed
    (ER/T host mode) with circuit event verification
  • adds CreateEnrollAndStartTunnelerEdgeRouterWithCfgTweaks to test context

@plorenz plorenz requested review from a team as code owners April 9, 2026 15:08
@plorenz plorenz force-pushed the specific-router-error-types branch from 6f7f18c to 9fb5773 Compare April 9, 2026 20:37
Comment on lines 238 to 241
func isNetworkTimeout(err error) bool {
var netErr net.Error
return errors.As(err, &netErr)
}
Copy link
Copy Markdown
Member

@andrewpmartinez andrewpmartinez Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this function wrong? This is any net error, and it looks like an ordered switch statement is used to help this function, when I think we can just augment this function:

func isNetworkTimeout(err error) bool {
    var netErr net.Error
    return errors.As(err, &netErr) && netErr.Timeout()
}

It looks like this entire implementation is fine, but the ordering is fragile. I think this would help reduce fragility.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed, good catch

Comment on lines +191 to +192
case ctrl_msg.ErrorTypeInvalidLinkDestination:
failureCause = CircuitFailureRouterErrInvalidLinkDest
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason this one case doesn't call self.serviceCounters.ServiceDialOtherError()?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. It's not a dial error, it's a route failure caused by a link going away, so it doesn't count as a service dial error. However, I think semantically we're using ServiceDial to be the circuit dial, not the dial on the router, so it probably should update that. fixing.

Comment on lines +63 to +66
t.Run("timeout", func(t *testing.T) {
err := fmt.Errorf("dial failed: %w", syscall.ETIMEDOUT)
require.Equal(t, byte(ctrl_msg.ErrorTypeDialTimedOut), classifyDialError(err))
})
Copy link
Copy Markdown
Member

@andrewpmartinez andrewpmartinez Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing the *net.Op variant which is what go actually throws in most paths.

*net.OpError wrapping syscall.ETIMEDOUT.

Example: &net.OpError{Err: &os.SyscallError{Err: syscall.ETIMEDOUT}}

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added a test for this

Copy link
Copy Markdown
Member

@andrewpmartinez andrewpmartinez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small improvements suggested.

plorenz added 2 commits April 14, 2026 15:52
…ixes #3364, fixes #2888, fixes #2859, fixes #1580

- adds ErrorType constants for rejected-by-application, DNS resolution failed,
  port not allowed, invalid link destination, and resources not available
- adds corresponding CircuitFailureCause strings reported in circuit events
- extracts classifyDialError() in route handler to map dial errors to specific
  error codes using typed errors, syscall constants, and string matching
- detects DNS errors via *net.DNSError and string fallback for ER/T hosted
  services where errors are serialized through the SDK message protocol
- detects resource exhaustion via EMFILE, ENFILE, ENOBUFS syscall errors
- introduces InvalidLinkDestinationError typed error in forwarder package
- adds unit tests covering all 16 classification cases
- adds integration tests for rejected-by-application (SDK host),
  DNS resolution failed, connection refused, and port not allowed
  (ER/T host mode) with circuit event verification
- adds CreateEnrollAndStartTunnelerEdgeRouterWithCfgTweaks to test context
@plorenz plorenz force-pushed the specific-router-error-types branch from 0cde94e to 3a31023 Compare April 14, 2026 19:55
@plorenz plorenz force-pushed the specific-router-error-types branch from 3a31023 to 90f32af Compare April 14, 2026 20:09
@plorenz plorenz merged commit 596e018 into main Apr 14, 2026
38 checks passed
@plorenz plorenz deleted the specific-router-error-types branch April 14, 2026 21:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants